Storing and retrieval method

ABSTRACT

The invention relates to a technique for utilizing a general-purpose computer controlled by an algorithm to store items from a multidimensional space in a store in such a way that retrieval of all items within a given tolerance of a search point can be effected rapidly. Basically, it is the use of unique algorithms in coordinating the function of the general-purpose computer and correlator that enables the rapid processing to be achieved. In effect, it is a between-limits hashing operation.

United States Patent 3,681,781, Batcher
[45] Aug. 1, 1972
[54] STORING AND RETRIEVAL METHOD
[72] Inventor: Kenneth E. Batcher, Stow, Ohio
[73] Assignee: Goodyear Aerospace Corporation, Akron, Ohio
[22] Filed: Sept. 4, 1970
[21] Appl. No.: 69,831
[52] U.S. Cl.: 444/1
[51] Int. Cl. ...15/34
[58] Field of Search: 340/172.5; 343/5
[56] References Cited, UNITED STATES PATENTS:
3,273,129  9/1966  ... et al.  340/172.5
3,521,277  7/1970  Evans  343/5
3,531,775  9/1970  Ishii  340/172.5
3,568,155  3/1971  Abraham et al.  340/172.5
Primary Examiner: Gareth D. Shaw
Assistant Examiner: Melvin B. Chapnick
Attorney: J. G. Pete and L. A. Germain
7 Claims, 6 Drawing Figures

[Drawing sheet 1, FIG. 4: bins of two-space labeled with their index pairs, 00 through 88]

[Drawing sheet 2 of 2, FIG. 5: flowchart of the search-bin algorithm, from START to FINISH]

BETWEEN-LIMITS HASHING

Introduction

Hash addressing, or hashing, is a well-known technique for storing a file of data in such a way that the records can be rapidly retrieved based on a single key. The technique is to generate a memory address (bin number) from the key and either store the record at that address or plant a link at that address which points to the record. Because different keys may generate the same bin number, a method is needed for handling the case where more than one record key maps into the same bin. The most common way of handling these overflows is to chain the records in each bin, using a link field in each record to point to the next record in the chain.

The normal hashing technique allows exact-match searches on the specified key. That is, to find a record one must know the exact key under which it is filed. Any error in the key will cause an error in the bin number generated from it and prevent the record from being found.
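The following is a minimal sketch of the chained exact-match hashing just described, including its failure on an inexact key. The sketch assumes integer keys, a table of 8 bins, and key modulo table size as the bin-number function. None of these choices comes from the patent, and `Record` and `ChainedHashTable` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    key: int
    data: str
    link: Optional["Record"] = None  # next record in the same bin's chain


class ChainedHashTable:
    """Exact-match hashing with overflow chains (illustrative only)."""

    def __init__(self, num_bins: int = 8) -> None:
        self.bins: List[Optional[Record]] = [None] * num_bins

    def bin_number(self, key: int) -> int:
        # Assumed bin-number function: key modulo the table size.
        return key % len(self.bins)

    def store(self, key: int, data: str) -> None:
        b = self.bin_number(key)
        # Plant the new record at the head of the bin's chain.
        self.bins[b] = Record(key, data, self.bins[b])

    def find(self, key: int) -> Optional[str]:
        rec = self.bins[self.bin_number(key)]
        while rec is not None:  # walk the chain of this one bin only
            if rec.key == key:
                return rec.data
            rec = rec.link
        return None


table = ChainedHashTable()
table.store(17, "track A")
table.store(25, "track B")  # 25 % 8 == 17 % 8, so it is chained in bin 1
print(table.find(25))  # track B
print(table.find(26))  # None: a key off by one lands in a different bin
```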

For some problems exact keys are not known and so the normal hashing technique does not work. One class of such problems includes those problems where only approximate key values are known. For instance, in a processor connected to a scanning radar performing a track-while-scan algorithm, the measured position of a target found by the radar must be compared with the predicted positions of aircraft tracks developed from previous radar scans, the correct track being associated with the target and the track being updated to reflect the current measured position. The fact that there are errors in the radar measurements and errors in predicting tracks because of aircraft heading, height and velocity changes means that the search for tracks must find those tracks whose predicted positions agree only approximately with the radar measurements.

Hence, we present some techniques for hashing which allow items to be found from approximate key values rather than exact key values. The item keys are considered to be coordinates of points in a finite Euclidean n-space (n is any positive integer), and the searches are searches for points lying within an n-cube of fixed dimensions whose center can be any point in the n-space.

In the radar example mentioned above, n = 3, the dimensions being latitude, longitude and height of the aircraft tracks. The dimensions of the search cube reflect the errors in the radar measurements and the maximum maneuvers an aircraft can be expected to make between radar scans.

Impossibility of Limiting Every Between-Limits Search to One Bin

In exact-match hashing, one bin number is generated from the key, and to find the desired record only the records in that bin need be examined. The only way to guarantee single-bin searches in between-limits hashing is to map all of the n-space into one bin. This is no solution, since every search would then examine every stored item. With any other division of the space into bins, some search cubes overlap bin boundaries. All bins that the cube intersects must be examined for points in the cube, so more than one bin needs to be examined in these cases.

Thus, the problem is to divide up a finite n-space into bins in such a way that the maximum number of bins intersected by an n-cube of fixed dimensions but arbitrary center is minimized. At the same time, we want to minimize the volume of the bins to reduce the likelihood of several points being in the same bin.

One-Dimensional Case

Assume n = 1. The points corresponding to values of keys lie in an interval of some length, L. Assume the between-limits searches to be carried out are for points within a fixed distance, d, of an arbitrary point, x. Consider a bin which covers all points greater than or equal to a and less than b, and assume $b - a < 2d$. If x should equal $(a + b)/2$, then $x - d < a$ and $x + d > b$, and the search interval overlaps the bin $[a, b)$, the bin below and the bin above. At least three bins must be searched. If $b - a \ge 2d$, then for no x will both $x - d < a$ and $x + d \ge b$ hold, so the search interval overlaps at most one of the adjacent bins.

If the width of every bin is 2d or more, no between-limits search with a search interval of 2d or less can cover more than two bins.

A good hash algorithm is as follows. The key value of each item being stored is divided by the width of the search interval, 2d, and the item is put in the bin whose index is the integer part of the quotient. Each between-limits search examines two bins. The center point of the search interval, x, is divided by 2d, and the integer part of the quotient is used as the index of the first bin to search. If the fractional part of the quotient is 1/2 or more, the second bin to search is the bin with the next-higher index. If the fractional part is less than 1/2, the second bin to search is the bin with the next-lower index.

Mapping Many Dimensions Into One Dimension

A between-limits search problem where n is greater than one can be converted to a one-dimensional problem, in which two bins must be searched for each between-limits search. This method has the disadvantage of creating bins with large volumes, so in many cases there will be too many items in one bin to permit rapid retrieval.

Assume the between-limits searches to be carried out are for points each of whose coordinates, $Y_i$, lies within a fixed distance, $d_i$, of the coordinate, $x_i$, of an arbitrary point ($1 \le i \le n$).

Let $S_i$, for $1 \le i \le n$, be any set of scaling factors such that the sum of $S_i d_i$ over all i is less than or equal to one-half:

$$\sum_{i=1}^{n} S_i d_i \le \tfrac{1}{2}.$$

If $Y_1, Y_2, \ldots, Y_n$ are the n key values of an item to be stored, the item is stored in the bin whose index is the integer part of $\sum_i S_i Y_i$. A between-limits search about the point $(x_1, x_2, \ldots, x_n)$ examines two bins. The first bin is the bin whose index is the integer part of the sum of $S_i x_i$ over all i. The second bin is the bin with the next-higher index if the fractional part of the sum is 1/2 or more, and the bin with the next-lower index if the fractional part is less than 1/2.

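This search will find all desired points. If $|x_i - Y_i| \le d_i$ for all i, then, for nonnegative scaling factors,

$$\Bigl|\sum_i S_i x_i - \sum_i S_i Y_i\Bigr| \le \sum_i S_i \,|x_i - Y_i| \le \sum_i S_i d_i \le \tfrac{1}{2},$$

so the integral and fractional parts of the search-point sum determine what integral parts these desired points' sums can have. If the fractional part of $\sum_i S_i x_i$ is 1/2 or more, a desired point's sum can fall only in the first bin or the next-higher one. Otherwise, it can fall only in the first bin or the next-lower one. (If some $S_i$ are negative, the condition must be placed on $\sum_i |S_i| d_i$ instead.)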

The scaling factors, $S_i$, can be quite arbitrary, provided the sum of $S_i d_i$ is 1/2 or less. With a particular choice of factors, all stored points in the n-space are projected onto a line (the sum axis) and are binned according to the positions of their projections on this line. A good choice of factors spreads the stored points out so that no bin receives too many points.
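The mapping method can be sketched in Python as follows. The sketch assumes nonnegative scaling factors, and the function names are hypothetical. Setting n = 1 and $S_1 = 1/(2d)$ reproduces the one-dimensional hash algorithm given earlier, since the sum is then just $x/2d$.

```python
import math
from typing import Sequence, Tuple


def sum_axis(point: Sequence, scales: Sequence):
    # Projection onto the "sum axis": sum of S_i * coordinate_i.
    return sum(s * c for s, c in zip(scales, point))


def check_scales(scales: Sequence, tolerances: Sequence) -> None:
    # Assumed precondition: nonnegative S_i with sum(S_i * d_i) <= 1/2.
    if any(s < 0 for s in scales) or sum_axis(tolerances, scales) > 0.5:
        raise ValueError("need nonnegative S_i with sum of S_i * d_i <= 1/2")


def store_bin(key: Sequence, scales: Sequence) -> int:
    # An item is stored in the bin given by the integer part of its sum.
    return math.floor(sum_axis(key, scales))


def search_bins(center: Sequence, scales: Sequence) -> Tuple[int, int]:
    # First bin: integer part of the centre's sum.  Second bin: next-higher
    # if the fractional part is 1/2 or more, otherwise next-lower.
    t = sum_axis(center, scales)
    first = math.floor(t)
    second = first + 1 if t - first >= 0.5 else first - 1
    return first, second


# One dimension, d = 2, S_1 = 1/(2d): the one-dimensional scheme.
print(store_bin([9.0], [0.25]))      # 2
print(search_bins([10.5], [0.25]))   # (2, 3)
# Two dimensions, d = (1, 2), S = (1/4, 1/8): sum of S_i * d_i is exactly 1/2.
check_scales([0.25, 0.125], [1.0, 2.0])
print(search_bins([3.0, 10.0], [0.25, 0.125]))  # (2, 1)
```

The guarantee depends only on $\sum_i S_i d_i \le 1/2$. Choosing factors that spread the projections apart is left to the user, as the text notes.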

For a better understanding of the invention reference should be had to the accompanying drawings wherein:

FIG. 1 shows a two-dimensional space with 13 points to be stored;

FIG. 2 illustrates a scaling factor picked so that S S which causes the projections to clump together, and some bins receive many points while others receive none;

FIG. 3 illustrates a scaling factor picked so that S S which causes the projections to spread apart;

FIG. 4 illustrates the bin arrangement for two-space;

FIG. 5 illustrates the algorithm for finding the indices of the intersected bins; and

FIG. 6 illustrates a general scheme for operation of the method of the invention with a computer and storage means.

In some cases, data points which are spread apart in n-space will clump severely when projected onto a line of any orientation, and the mapping method will not give a good hashing scheme. FIG. 1 illustrates the hashing problem. FIG. 2 illustrates one conventional approach to solving it, which ends up clumping the points badly. FIG. 3 is somewhat better than FIG. 2 because the projections are spread out more. The brick wall method described below is better still.

Brick Wall Method

In this method a digital computer is programmed so that n-space is divided into bins in such a way that a search cube can never intersect more than n + 1 bins. The bin volumes are not large, so clumping of several points into one bin will be relatively rare. The name brick wall comes from the fact that the pattern of bins resembles a common brick wall.

Let n be the number of dimensions of the search. Let $d_i$ be the tolerance allowed in the i-th dimension. That is, we want to find points each of whose coordinates, $Y_i$, lies within $d_i$ of the i-th coordinate, $x_i$, of an arbitrary search point.

Bins will be indexed with n-dimensional vectors of integers. The n-dimensional integer vector of a bin can easily be transformed into a single integer, which becomes its address in memory. For instance, one can weight the components with a set of weights and add the weighted components of the integer vector to obtain a single integer.

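Let $(Y_1, Y_2, \ldots, Y_n)$ be the n coordinates of a point to be stored. The point is stored in bin $(b_1, b_2, \ldots, b_n)$, where

$$b_1 = \left[\frac{Y_1}{2d_1}\right], \qquad b_i = \left[\frac{1}{i}\left(\frac{Y_i}{2d_i} + \sum_{j=1}^{i-1} b_j\right)\right] \quad (2 \le i \le n),$$

and $[\,\cdot\,]$ denotes the integer part.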
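The storage rule can be sketched as below, with `math.floor` standing in for the integer part. It agrees with truncation for nonnegative values. The weights (1, 16) and the table size of 64 used to fold a bin vector into one address are hypothetical, since the text says only that some set of weights is used.

```python
import math
from typing import Sequence, Tuple


def brick_wall_bin(y: Sequence[float], d: Sequence[float]) -> Tuple[int, ...]:
    """Bin vector (b_1, ..., b_n) of a stored point y under tolerances d."""
    b = []
    for i, (yi, di) in enumerate(zip(y, d), start=1):
        # b_i = [(y_i / 2d_i + b_1 + ... + b_{i-1}) / i]; for i = 1 the sum is empty.
        b.append(math.floor((yi / (2 * di) + sum(b)) / i))
    return tuple(b)


def bin_address(b: Sequence[int], weights: Sequence[int], table_size: int) -> int:
    # Hypothetical weighted-sum transform of the bin vector into one address.
    return sum(w * bi for w, bi in zip(weights, b)) % table_size


tol = (1.0, 1.0)
for point in [(9.5, 9.5), (10.5, 9.5), (10.5, 10.8)]:
    b = brick_wall_bin(point, tol)
    print(point, b, bin_address(b, weights=(1, 16), table_size=64))
```

With tolerances $d_1 = d_2 = 1$, the three sample points fall in bins (4, 4), (5, 4) and (5, 5), the same index pattern as in the FIG. 4 example.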

FIG. 4 shows the bins for two-space. The lengths $d_1$ and $d_2$ are indicated on the figure, and the bin indices are indicated inside the bins. Note that a rectangle of length $2d_1$ and height $2d_2$ intersects exactly three bins no matter where it is placed on the diagram, as long as its sides are oriented with the axes. Note also that the sums of the bin indices, $b_1 + b_2$, for the three bins it intersects are three successive integers. For example, the dotted rectangle in FIG. 4 intersects bins 44, 54 and 55 (that is, $(b_1, b_2) = (4, 4)$, $(5, 4)$ and $(5, 5)$), with index sums 8, 9 and 10, respectively.

For n dimensions, an oriented n-cube with any arbitrary center $(x_1, x_2, \ldots, x_n)$ and side dimensions $2d_1, 2d_2, \ldots, 2d_n$, respectively, will intersect n + 1 bins, and the index sums of the n + 1 bins are n + 1 successive integers. This can easily be proven by induction on n.
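The induction step can be sketched as follows. This is our reconstruction from the storage rule and the search steps, not the patent's own proof. Here $u_i = (x_i - d_i)/(2d_i)$ is the scaled lower face of the cube in dimension i, and $\{q\} = q - [q]$ denotes the fractional part.

```latex
\begin{aligned}
&\text{Inside the cube: } u_i \le Y_i/(2d_i) \le u_i + 1, \qquad u_i = (x_i - d_i)/(2d_i).\\
&\text{Hypothesis: dimensions } 1,\dots,i-1 \text{ meet } i \text{ bins, with index sums } s = m, m+1, \dots, m+i-1.\\
&\text{For a bin prefix with sum } s, \text{ let } q_s = (s + u_i)/i; \text{ then } [q_s] \le b_i \le [q_s + 1/i].\\
&[q_s + 1/i] = [q_s] + 1 \iff \{q_s\} \ge (i-1)/i; \text{ otherwise the two bounds are equal}.\\
&q_{s+1} = q_s + 1/i, \text{ so the } i \text{ values } \{q_s\} \text{ are spaced } 1/i \text{ apart modulo } 1\\
&\quad\Rightarrow \text{exactly one } s^* \text{ has } \{q_{s^*}\} \ge (i-1)/i \text{: one extra bin, } i+1 \text{ in all}.\\
&\text{Also } [q_{s+1}] = [q_s] + 1 \text{ exactly when } s = s^*, \text{ so the new sums } s + [q_s]\\
&\quad\text{rise by 1 per step except by 2 after } s^*; \text{ the extra bin's sum } s^* + [q_{s^*}] + 1\\
&\quad\text{fills that gap, or extends the run when } s^* = m + i - 1.\\
&\text{Base case } n = 1\text{: bins } [u_1] \text{ and } [u_1] + 1, \text{ whose sums are successive}.
\end{aligned}
```

Hence the cube meets exactly $i + 1$ bins in dimensions 1 through i, and their index sums are $i + 1$ successive integers, which is the statement for $n = i$.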

The algorithm for finding the indices of the intersected bins is charted in FIG. 5. The nomenclature is as follows:

$d_i$ = tolerance in the i-th dimension
$x_i$ = i-th coordinate of the center of the search cube
$b_{ij}$ = i-th index of bin j
n = number of coordinates
$[x]$ = greatest integer not more than x

In words the algorithm is as follows:

1. For each coordinate of the center point of the search, divide the coordinate value $x_i$ by the search tolerance $d_i$ of the corresponding dimension, subtract unity and divide by two to obtain the transformed coordinate $x'_i = (x_i/d_i - 1)/2$;

2. Truncate the first transformed coordinate $x'_1$ to an integer to obtain the first index $[x'_1]$ of the first search bin;

3. Add unity to the first index of the first search bin to obtain the first index of the second search bin;

4. If the number of dimensions in the search space is two or more, then for each of the first two bins:

5. Add the first index $b_1$ of the bin to the second transformed coordinate $x'_2$, divide by two, and separate the quotient q into an integral part $[q]$ and a fractional part;

6. Use the integral part $[q]$ of the quotient as the second index $b_2$ of the bin. If the fractional part is equal to or greater than one-half of unity, then use the first index $b_1$ of the bin as the first index of the third search bin, and $[q] + 1$ as the second index of the third bin;

7. If the number of dimensions in the search space is three or more, then for each of the first three bins:

8. Add the sum of the first two indices, $b_1 + b_2$, of the bin to the third transformed coordinate $x'_3$, divide by three, and separate the quotient q into an integral part $[q]$ and a fractional part. Use $[q]$ as the third index $b_3$ of the bin. If the fractional part is equal to or greater than two-thirds of unity, then use the first two indices $b_1, b_2$ of the bin as the first two indices of the fourth search bin, and $[q] + 1$ as the third index of the fourth bin.

If the number of dimensions in the search space is four or more, then for each of the first four bins, add the sum of the first three indices of the bin to the fourth transformed coordinate, divide by four, and separate the quotient into an integral part and a fractional part. Use the integral part of the quotient as the fourth index of the bin. If the fractional part is equal to or greater than three-fourths of unity, then use the first three indices of the bin as the first three indices of the fifth search bin, and the integral part plus one as the fourth index of the fifth bin.

If the number of dimensions in the search space is five or more, then for each of the first five bins, add the sum of the first four indices of the bin to the fifth transformed coordinate, divide by five, and separate the quotient into an integral part and a fractional part. Use the integral part of the quotient as the fifth index of the bin. If the fractional part is equal to or greater than four-fifths of unity, then use the first four indices of the bin as the first four indices of the sixth search bin, and the integral part plus one as the fifth index of the sixth bin.

If the number of dimensions is six or more, then for each dimension past the fifth perform the analogous step. For the i-th dimension the step is as follows. For each of the first i bins, add the sum of the first $i - 1$ indices of the bin to the i-th transformed coordinate, divide by i, and separate the quotient into an integral part and a fractional part. Use the integral part of the quotient as the i-th index of the bin. If the fractional part is equal to or greater than $(i - 1)/i$ of unity, then use the first $i - 1$ indices of the bin as the first $i - 1$ indices of the $(i + 1)$-th search bin, and the integral part plus one as the i-th index of the $(i + 1)$-th bin.

At the conclusion there is one more search bin than the number of dimensions in the search space, and each bin has a number of indices equal to the number of dimensions.
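The following Python sketch of the search-bin steps above mirrors the numbered steps rather than the FORTRAN listing. `search_bins`, `store_bin` and `corner_bins` are hypothetical names. The corner enumeration is our own brute-force cross-check, not part of the patent. Exact arithmetic, such as `fractions.Fraction`, is assumed wherever a search point may sit exactly on a bin edge.

```python
import math
from fractions import Fraction
from itertools import product
from typing import List, Sequence, Set, Tuple

Bin = Tuple[int, ...]


def search_bins(x: Sequence, d: Sequence) -> List[Bin]:
    """The n + 1 bins to search about centre x, following steps 1 to 8."""
    n = len(x)
    u = [(xi / di - 1) / 2 for xi, di in zip(x, d)]  # step 1: transformed x'_i
    first = math.floor(u[0])                          # step 2: [x'_1]
    bins = [[first], [first + 1]]                     # step 3
    for i in range(2, n + 1):                         # dimension i, counted from 1
        extra = None
        for b in bins:                                # the first i bins
            q = (sum(b) + u[i - 1]) / i
            whole = math.floor(q)
            if q - whole >= Fraction(i - 1, i):       # fractional part >= (i-1)/i
                extra = b + [whole + 1]               # starts bin i + 1
            b.append(whole)
        # In exact arithmetic exactly one of the first i bins qualifies.
        assert extra is not None, "floating-point rounding at a bin edge"
        bins.append(extra)
    return [tuple(b) for b in bins]


def store_bin(y: Sequence, d: Sequence) -> Bin:
    # Storage rule: b_i = [(y_i / 2d_i + b_1 + ... + b_{i-1}) / i].
    b: List[int] = []
    for i, (yi, di) in enumerate(zip(y, d), start=1):
        b.append(math.floor((yi / (2 * di) + sum(b)) / i))
    return tuple(b)


def corner_bins(x: Sequence, d: Sequence) -> Set[Bin]:
    # Brute-force cross-check: given its earlier indices, b_i takes at most two
    # values over the search box, one at each end of the y_i range, so every
    # bin the closed box meets holds one of its 2**n corners.
    return {store_bin([xi + s * di for xi, s, di in zip(x, signs, d)], d)
            for signs in product((-1, 1), repeat=len(x))}


print(search_bins([10, 10], [1, 1]))  # [(4, 4), (5, 4), (5, 5)]
```

With centre (10, 10) and tolerances (1, 1), the sketch returns three bins whose index sums 8, 9 and 10 are successive, as the text requires.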

A typical computer program implementing the algorithm of FIG. 5 and the description above is set forth in USA Standard FORTRAN as follows:

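      SUBROUTINE GENIND (N, X, INDEX, D)
C     N = NUMBER OF DIMENSIONS.
C     X IS AN ARRAY OF N REALS.
C     X(I) = I-TH COORDINATE OF CENTER OF SEARCH.
C     D IS AN ARRAY OF N REALS.
C     D(I) = SEARCH TOLERANCE IN I-TH DIMENSION.
C     INDEX IS AN ARRAY OF N BY (N+1) INTEGERS.
C     INDEX(I,J) = I-TH INDEX OF BIN J.
C     GIVEN N, X AND D, GENIND GENERATES THE INDICES OF THE BINS TO
C     SEARCH IN INDEX.  X IS OVERWRITTEN BY THE TRANSFORMED
C     COORDINATES.  REAL-TO-INTEGER CONVERSION TRUNCATES, SO EACH X(I)
C     IS ASSUMED TO BE AT LEAST D(I); TRUNCATION THEN EQUALS [X].
C     (THE ADJUSTABLE BOUNDS BELOW NEED FORTRAN 77 OR LATER.)
      INTEGER N, INDEX(N, N+1)
      REAL X(N), D(N)
C     STEP 1: TRANSFORMED COORDINATES (X(I)/D(I) - 1)/2.
      DO 10 I = 1, N
   10 X(I) = (X(I)/D(I) - 1.0)/2.0
C     STEPS 2 AND 3: FIRST INDICES OF THE FIRST TWO BINS.
      INDEX(1,1) = X(1)
      INDEX(1,2) = INDEX(1,1) + 1
      IF (N - 1) 70, 70, 20
C     DIMENSION I: I-TH INDEX OF EACH OF THE FIRST I BINS.
   20 DO 60 I = 2, N
      IMI = I - 1
      AIMI = IMI
      AI = I
      IPI = I + 1
      DO 60 J = 1, I
      JSUM = 0
      DO 30 K = 1, IMI
   30 JSUM = JSUM + INDEX(K,J)
      SUM = JSUM
      QUOT = (SUM + X(I))/AI
      INDEX(I,J) = QUOT
      AINDEX = INDEX(I,J)
C     A FRACTIONAL PART OF AT LEAST (I-1)/I STARTS BIN I+1.
      IF (AI*(QUOT - AINDEX) - AIMI) 60, 40, 40
   40 DO 50 K = 1, IMI
   50 INDEX(K,IPI) = INDEX(K,J)
      INDEX(I,IPI) = INDEX(I,J) + 1
   60 CONTINUE
   70 RETURN
      END

A SIGMA 5 general-purpose digital computer, as manufactured by Scientific Data Systems, which is now a division of Xerox Corporation, would handle the program control desired by the invention.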

The data is stored in memory according to the algorithm, which in effect defines the physical bin arrangement of FIG. 4.

FIG. 6 illustrates a computer whose storage memory 60 cooperates with an input-output section 62 and a control and arithmetic section 64. The input information from the n-space consists of aircraft targets 66 detected by a suitable radar 68 and fed to section 62. The method of the invention operates through section 64 to divide the storage area of memory 60 into bins and to coordinate section 62 so that information passes to and from the bins according to the storage and search rules described above.
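The following is a toy end-to-end sketch of the arrangement of FIG. 6. It assumes that a Python dictionary stands in for the bins of memory 60, that bin vectors are used directly as keys instead of being folded into single addresses, and that each radar target arrives as a 3-tuple of coordinates. `BrickWallStore`, the tolerances and the two sample tracks are hypothetical.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple

Bin = Tuple[int, ...]


class BrickWallStore:
    """Toy in-memory stand-in for memory 60: items filed by brick-wall bin."""

    def __init__(self, d: Sequence) -> None:
        self.d = list(d)  # search tolerance d_i for each dimension
        self.bins: DefaultDict[Bin, List[Tuple[tuple, str]]] = defaultdict(list)

    def _store_bin(self, y: Sequence) -> Bin:
        b: List[int] = []
        for i, (yi, di) in enumerate(zip(y, self.d), start=1):
            b.append(math.floor((yi / (2 * di) + sum(b)) / i))
        return tuple(b)

    def _search_bins(self, x: Sequence) -> List[Bin]:
        u = [(xi / di - 1) / 2 for xi, di in zip(x, self.d)]
        bins = [[math.floor(u[0])], [math.floor(u[0]) + 1]]
        for i in range(2, len(x) + 1):
            extra = None
            for b in bins:
                q = (sum(b) + u[i - 1]) / i
                whole = math.floor(q)
                if i * (q - whole) >= i - 1:  # same test as the FORTRAN IF
                    extra = b + [whole + 1]
                b.append(whole)
            assert extra is not None  # exact arithmetic always yields one
            bins.append(extra)
        return [tuple(b) for b in bins]

    def store(self, y: Sequence, payload: str) -> None:
        self.bins[self._store_bin(y)].append((tuple(y), payload))

    def retrieve(self, x: Sequence) -> List[str]:
        # Examine only the n + 1 candidate bins, then keep items inside the cube.
        hits = []
        for key in self._search_bins(x):
            for y, payload in self.bins.get(key, []):
                if all(abs(yi - xi) <= di for yi, xi, di in zip(y, x, self.d)):
                    hits.append(payload)
        return hits


tracks = BrickWallStore([3.0, 3.0, 0.5])     # hypothetical tolerances
tracks.store((120.0, 45.0, 9.1), "track 17")
tracks.store((200.0, 80.0, 10.4), "track 42")
print(tracks.retrieve((121.5, 44.0, 9.3)))   # ['track 17']
```

Retrieval touches only the n + 1 bins produced by the search steps and then applies the exact tolerance test. Items stored in other bins, such as track 42, are never examined.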

Conclusions

The brick wall method allows a between-limits search to be conducted in n-space by searching n + 1 bins. The bins are relatively small, so the probability of a bin containing many points is small. If the number of bins is too large, they can be combined as necessary to reduce the number of memory addresses needed.

By mapping n-space onto a line in a special way, the number of bins to be searched for a between-limits search can be reduced to two. Unfortunately, this method creates rather large bins, which may contain many points in certain situations.

Therefore, in accordance with the Patent Statutes I have described the preferred embodiment of my method to coordinate a general purpose digital computer to enhance storage and retrieval of data therewithin in reduced times. However, it is to be understood that my invention is not to be limited thereto or thereby, but the scope of the invention is defined in the appended claims.

What is claimed is:

1. A method for the storage and retrieval of information which comprises the steps of controlling an information storage mechanism to
a. divide n space into bins in such a way that any search cube of a preselected size intersects at most n + 1 bins,
b. index the bins with n-dimensional vectors of integers,
c. transform the n-dimensional vector of each bin designating vector into a single integer to provide a bin address, and
d. pass information to and from the bins according to the assumption that a between limits search is to be carried out for points in said search cube.

2. A method according to claim 1 where the n space is divided so that the search cube intersects two bins at most according to a scaling factor $S_i$ chosen so that the sum of $S_i d_i$ is 1/2 or less, where $d_i$ is a fixed distance defining the between limits search field and i is an integer, $1 \le i \le n$.

3. A method according to claim 1 which includes the step of detecting aircraft positions by radar to provide the input information, and where step (a) divides the n space into a common brick wall pattern of bins.

4. A method for the storage and retrieval of information representing random points in space which comprises the steps of controlling a general purpose digital computer according to the following steps:
1. selecting a center point, and for each coordinate of the center point dividing the coordinate value by a predetermined search tolerance, subtracting unity, and dividing by two to obtain a transformed coordinate,
2. truncate the first transformed coordinate to an integer to obtain the first index of the first bin,
3. add unity to the first index of the first bin to obtain the first index of the second bin,
4. for each of the first two bins add the first index of the bin to the second transformed coordinate, divide by two, and use the integral part of the quotient as the second index of the bin, and if the fractional part is equal to or greater than one-half of unity then use the first index of the bin as the first index of the third bin, and the integral part of the quotient plus one as the second index of the third bin.

5. A method according to claim 4 where if the number of dimensions in the space is three or more, for each of the first three bins: add the sum of the first two indices of the bin to the third transformed coordinate, divide by three and separate the quotient into an integral part and a fractional part, and use the integral part of the quotient as the third index of the bin, and if the fractional part is equal to or greater than two-thirds of unity then use the first two indices of the bin as the first two indices of the fourth bin and the integral part plus one as the third index of the fourth bin.

6. A method according to claim 5 where if the number of dimensions in the space is four or more for each of the first four bins add the sum of the first three indices of the bin to the fourth transformed coordinate, divide by four and separate the quotient into an integral part and a fractional part, and use the integral part of the quotient as the fourth index of the bin, and if the fractional part is equal to or greater than three-fourths of unity then use the first three indices of the bin as the first three indices of the fifth bin and the integral part plus one as the fourth index of the fifth bin.

7. A method according to claim 6 where if the number of dimensions is i or more then for each of the first i bins
5. add the sum of the first $i - 1$ indices of the bin to the i-th transformed coordinate and divide by i and separate the quotient into an integral part and a fractional part, and
6. use the integral part of the quotient as the i-th index of the bin, and if the fractional part is equal to or greater than $(i - 1)/i$ of unity then using the first $i - 1$ indices of the bin as the first $i - 1$ indices of the $(i + 1)$-th search bin and the integral part plus 1 as the i-th index of the $(i + 1)$-th bin, and
7. repeat steps (5) and (6) sequentially until all dimensions have been completed.

UNITED STATES PATENT OFFICE
CERTIFICATE OF CORRECTION

Patent No. 3,681,781, dated August 1, 1972
Inventor(s): Kenneth E. Batcher

It is certified that error appears in the above-identified patent and that said Letters Patent are hereby corrected as shown below:

Column 3, line 65, change "b to -b
Column 4, line 12, change "2, d to 2d
Column 4, line 29, change "x to -x

Signed and sealed this 23rd day of January 1973.

(SEAL) Attest:

EDWARD M. FLETCHER, JR., Attesting Officer
ROBERT GOTTSCHALK, Commissioner of Patents
